
DAOS-18727 rsvc: Fix add_replicas_s error handling (#18536)#18607

Draft
liw wants to merge 1 commit into release/2.8 from liw/add-replicas-notleader-2.8


liw commented Jul 2, 2026

During a PS reconfiguration, inside ds_rsvc_add_replicas_s, if the rdb_modify_replicas call returns -DER_NOTLEADER, the new replica may already have been added to the local membership, yet ds_rsvc_add_replicas_s destroys it anyway. The resulting missing member may render the PS unavailable, especially after an rdb_dictate call.

This patch changes the error handling to destroy the new replica only if it hasn't been added to the local membership. (We add it locally before adding it remotely.)
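The ordering that makes the fix necessary (local add first, remote add second, with -DER_NOTLEADER possible after the local add) can be modeled as a small state machine. The sketch below is hypothetical and self-contained: `struct sim`, `sim_modify_replicas`, `sim_add_replica`, and `ERR_NOTLEADER` are illustrative stand-ins, not the real rdb/rsvc code, and the actual patch may detect local membership differently.

```c
#include <stdbool.h>

/* Hypothetical stand-in for -DER_NOTLEADER; the value is arbitrary. */
#define ERR_NOTLEADER (-1)

/* Hypothetical model of one replica addition. */
struct sim {
	bool in_local_membership; /* new replica recorded in local membership */
	bool replica_exists;      /* new replica's storage still exists */
	bool fail_before_local;   /* lose leadership before the local add */
	bool fail_after_local;    /* lose leadership after the local add */
};

/* Hypothetical stand-in for rdb_modify_replicas: local add, then remote add. */
static int sim_modify_replicas(struct sim *s)
{
	if (s->fail_before_local)
		return ERR_NOTLEADER;        /* nothing changed locally */
	s->in_local_membership = true;       /* local add happens first */
	if (s->fail_after_local)
		return ERR_NOTLEADER;        /* remote add fails afterward */
	return 0;
}

/* Hypothetical model of the fixed error path in the add-replicas handler. */
static int sim_add_replica(struct sim *s)
{
	int rc;

	s->replica_exists = true;            /* create the new replica */
	rc = sim_modify_replicas(s);
	/* Destroy only if the local membership does not reference it. */
	if (rc != 0 && !s->in_local_membership)
		s->replica_exists = false;
	return rc;
}
```

With `fail_after_local` set, the call still returns `ERR_NOTLEADER`, but the replica survives because the local membership already lists it; with `fail_before_local` set, it is destroyed as before.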

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Li Wei <liwei@hpe.com>
github-actions bot commented Jul 2, 2026

Ticket title is './recovery/pool_list_consolidation.py:PoolListConsolidationTest.test_lost_majority_ps_replicas - rdb-pool are recovered, three out of four ranks should have rdb-pool'
Status is 'Awaiting backport'
Labels: 'ci_master_daily,daily_test'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-18727

@github-actions github-actions Bot added the priority Ticket has high priority (automatically managed) label Jul 2, 2026
@liw liw added the clean-cherry-pick Cherry-pick from another branch that did not require additional edits label Jul 2, 2026